Merged
Note that I'm halving the MD5 sum here, reducing it to 8 bytes (64 bits) instead of 16 bytes. By the birthday bound, collisions only become likely once we store around sqrt(2⁶⁴) = 2³² hosts, which is roughly four billion (4294967296). A hosts file containing that many hosts would be more than a gigabyte.
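A minimal sketch of the truncation described above (the function name is mine, not from the PR): take the 16-byte MD5 sum and keep only its first 8 bytes as a fixed-size array.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// truncatedHash returns the first 8 bytes (64 bits) of the MD5 sum of host.
// Illustrative only; the PR may name and structure this differently.
func truncatedHash(host string) [8]byte {
	sum := md5.Sum([]byte(host)) // [16]byte
	var h [8]byte
	copy(h[:], sum[:8])
	return h
}

func main() {
	h := truncatedHash("example.com")
	fmt.Printf("%x\n", h)
	// Birthday bound: collisions become likely only once we store
	// around sqrt(2^64) = 2^32 distinct hosts.
	fmt.Println(uint64(1) << 32) // 4294967296
}
```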
Out of curiosity, I also tried using …

I'll merge this; it seems like nothing broke.
Instead of keeping a map of host strings, we can hash them and keep only the hash sums to save space. Hash sums are fixed-size arrays, which are easier to manage and behave more predictably with respect to memory consumption.
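The approach described above can be sketched as a set keyed by fixed-size hashes rather than by host strings. This is an assumption-laden illustration (type and method names are mine): it uses MD5 truncated to 8 bytes as the key, so memory per entry is constant regardless of hostname length.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// hostKey is the fixed-size hash stored in place of the host string.
type hostKey [8]byte

// keyFor hashes a host down to 8 bytes (first half of its MD5 sum).
func keyFor(host string) hostKey {
	sum := md5.Sum([]byte(host))
	var k hostKey
	copy(k[:], sum[:8])
	return k
}

// blockedSet keeps only the 8-byte hashes; Go arrays are comparable,
// so they can be used directly as map keys.
type blockedSet map[hostKey]struct{}

func (s blockedSet) add(host string) { s[keyFor(host)] = struct{}{} }

func (s blockedSet) contains(host string) bool {
	_, ok := s[keyFor(host)]
	return ok
}

func main() {
	s := blockedSet{}
	s.add("ads.example.com")
	fmt.Println(s.contains("ads.example.com")) // true
	fmt.Println(s.contains("example.org"))     // false
}
```

Because only hashes are stored, the original hostnames cannot be recovered from the set — which matches the listing limitation noted below.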
After some profiling with the hosts file from http://hosts-file.net/download/hosts.txt, I noticed the following:
| representation | memory (Mb) |
| --- | --- |
| `string` | … |
| `[8]byte` | … |

The runtime for parsing the entire hosts file did not change noticeably (it stayed around 3.4 seconds).
So far the only downside of this approach seems to be that there is no way to list blocked hosts in the web UI, since the hashes cannot be reversed back into hostnames. Because of that, we can only use this approach for the public hosts files (which are the huge ones anyway; private hosts files will likely be much smaller).
Closes #9, for now (we could still investigate that later).